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The Galaxy is suspected to contain hundreds of millions of binary white dwarf systems, a large 
fraction of which will have sufficiently small orbital period to emit gravitational radiation in band for 
space-based gravitational wave detectors such as the Laser Interferometer Space Antenna (LISA). 

LISA’s main science goal is the detection of cosmological events (supermassive black hole mergers, 
etc.) however the gravitational signal from the galaxy will be the dominant contribution to the data 
- including instrumental noise - over approximately two decades in frequency. The catalogue of 
detectable binary systems will serve as an unparalleled means of studying the Galaxy. Furthermore, 
to maximize the scientific return from the mission, the data must be ‘ cleansed 5 of the galactic 
foreground. We will present an algorithm that can accurately resolve and subtract > 10000 of these 
sources from simulated data supplied by the Mock LISA Data Challenge Task Force. Using the 
time evolution of the gravitational wave frequency, we will reconstruct the position of the recovered 
binaries and show how LISA will sample the entire compact binary population in the Galaxy. 


I. INTRODUCTION 

With the direct detection of gravitational waves (GWs) 
poised to be made this decade, the long-awaited dawn of 
gravitational wave astronomy is near. Similar to tra- 
ditional photon astronomy, different bands of the GW 
spectrum offer unique channels of information about the 
Universe. High-frequency gravitational waves are the tar- 
get of ground based interferometers, and the LIGO/Virgo 
collaboration [1,2] have developed astonishingly sensitive 
observatories which are poised to bring GW astronomy 
out of its infancy. 

While small wavelength gravitational waves are, obser- 
vationally, the most readily accessible, the richest signal 
space in the GW universe is at frequencies far too low for 
Earth-based instruments. A space-borne interferometer 
is the only foreseeable way of reaping the bounty of infor- 
mation transmitted at longer wavelengths, and allowing 
gravitational wave astronomy to reach its full potential. 

Unique to the mHz frequency range is the existence of 
known sources, comprised of close binary star systems in 
the Galaxy, mostly the AM CVn-type binary stars [3]. 
These individual objects, discovered electromagnetically, 
are the near-by representatives of a much larger popula- 
tion of low frequency gravitational wave sources on our 
cosmic doorstep. Population synthesis models for the 
galaxy predict some 60 million binary star systems emit- 
ting gravitational radiation at frequencies between 0.1 
and 10 mHz [4-6]. Because gravitational waves interact 
very weakly with intervening matter, a staggeringly large 
number of these objects, distributed throughout the en- 
tire Galaxy, are potentially observable. 

Maximizing what can be gleaned from mHz GW data 
will require solving unique analysis problems in gravi- 
tational wave astronomy. In particular, the challenges 
posed by these galactic binaries are the sheer number 
of sources, and the degree with which they overlap one 
another in signal space, smearing together to form a con- 


fused blend of gravitational wave power which, depend- 
ing on details of the detector, can be orders of magni- 
tude larger than the instrumental noise floor. The total 
number of binaries which are individually resolvable out 
of this population is unknown and poses a very large- 
dimension model selection problem. The harm in over- 
fitting the data, (i.e, tolerating large false alarm proba- 
bilities) or under-fitting the data (accepting large false 
dismissal rates) not only affects the science that can be 
done with the catalogue of resolved signals, but can also 
impact the data analysis efforts for more distant sources 
of gravitational radiation which share the same band- 
width. 

To prepare for this challenge, we have set out to build 
a “detection pipeline” which can automatically solve ev- 
ery facet of the galactic binary detection problem. To 
wit, we need to locate candidate sources in the data, 
select for the most parsimonious number, accurately es- 
timate the physical parameters that describe the system, 
and cleanly regress the sources from the data. While the 
analysis software we have built is flexible with regards 
to the details of the instrument, we use the Laser In- 
terferometer Space Antenna (LISA), a joint NASA/ESA 
mission concept [7]. This choice was driven by our in- 
tention of participating in round four of the Mock LISA 
Data Challenges (MLDC) [8]. MLDC datasets are re- 
leased in pairs: one coming with the list of signal pa- 
rameters within the data, and one without (henceforth, 
the “training data” and “blind data 7 , respectively). For 
this study, we will focus on analyzing the training data in 
which all other types of sources have been removed, leav- 
ing behind only the galactic binaries and the instrument 
noise. The capabilities of the algorithm on this reduced 
dataset will serve as a realistic demonstration of what we 
could achieve on the blind data, as the only cosmolog- 
ical sources which can impact the number of resolvable 
binaries are the brightest of the binary black hole merg- 
ers and cosmic strings, neither of which pose a serious 
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challenge to existing search algorithms. 

The challenge presented by the galactic foreground has 
been addressed with iteratively more sophistication in 
past MLDCs [9]. Challenge 2 was the first to simulate 
a complete galaxy and in response to this, Crowder and 
Cornish developed the BAM algorithm [10] which, to date, 
has been the most successful demonstration of galactic 
binary data analysis. The BAM codes themselves, which 
have been lost to the sands of time, did not model the 
evolution of a binary’s orbital period (nor did Challenge 
2). Other existing algorithms which have worn their teeth 
on the MLDC data sets include an F-statistic maximiza- 
tion scheme using a Nelder-Mead simplex algorithm [11], 
a hierarchical cleaning algorithm (also employing the F- 
statistic) [12] and an MCMC search algorithm featuring 
a Markovian delayed rejection proposal distribution [13]. 
Beyond MLDC entries there have been numerous proof of 
principal studies including (but not limited to), Refs. [14- 
16]. 

Our goal for this paper is to build a new analysis 
pipeline to process the signal from the population of 
galactic binaries. We are, in effect, attempting to ex- 
tend the BAM algorithm by including frequency evolution 
in our model of the waveforms, and incorporating tech- 
nical advancements made in GW data analysis since the 
second round of the MLDC. 

The paper is organized as follows: In §11 we outline 
the basics of the data analysis methodology used in this 
work. In §111 we describe our model for the data, both 
for the GW signals as well as the instrument noise. The 
search algorithm is spelled out in §IV, and results from 
the MLDC challenge 4 training data are discussed in §V. 
We close in §VI with discussion of future work which 
will utilize the bright source catalogue as a novel tool for 
Galactic astronomy. 


II. DATA ANALYSIS BASICS 


There are two desired products of the data analysis 
procedure. For each model of the data (M) under con- 
sideration, we need to find the “best fit” model param- 
eters and have some sense of how well these parameters 
are constrained, as well as determine which model is most 
strongly supported by the data. 

The mathematical foundation on which the data anal- 
ysis theory is built is Bayes’ theorem which, when cast 
as a tool for solving inference problems, takes the form 


p{s\9,M)p(d\M) 
m ’- M) MM) • 


( 1 ) 


The left hand side of equation 1 is the posterior distri- 
bution function (or ‘ posterior” for short) for parame- 
ters 0 given the data s, from which parameter estimation 
conclusions can be drawn within model M. (see, for in- 
stance, [17] for a thorough introduction). The numerator 
of the right hand side contains the product of the likeli- 
hood p(s|0, M) - our ‘ goodness of fit” measure - and the 


prior, encoding our knowledge before the new data were 
collected. The denominator is the evidence, or marginal- 
ized likelihood for M . Comparisons of p(s\M) between 
different models reveal which is most strongly supported 
by the data and, assuming uninformative priors on the 
models themselves, should be taken as the preferred rep- 
resentation. 

In the Bayesian framework, one needs only to define 
the likelihood and prior distributions, and the rest of the 
analysis is reduced to an oft time-consuming calculation. 
We prefer the Markov Chain Monte Carlo (MCMC) fam- 
ily of algorithms to perform the calculations, although 
Nested Sampling, and its offspring MultiNest, have been 
used to similar affect (Refs. [18, 19]). There is no short- 
age of data analysis literature describing the concept of 
MCMCs. However, for the sake of introducing vocabu- 
lary and notation, we briefly sketch the algorithm. 

The first ‘link” in the Markov^ chain is some ran- 
dom position in parameter space 0 X for which we have 
evaluated the likelihood p(s\9 Xi M) and prior probabil- 
ity p(O x \M) . From there, a new trial position, 9 y , is 
drawn from a proposal distribution q{0 y \0 x ). The likeli- 
hood and prior probability are evaluated for 9 y , and it is 
adopted as the next sample in the chain with probability 

where Hg^ is the Hastings 

ratio 

H _ _ = p(s|l9 i ,,Af)p(g 3/ |A^)9(4|^/) 
flx ^ v p{s\6 x , M)p(0 x \M)q(6 v \6 x ) 

If 0 y is rejected, the chain remains at 9 X and a new 
trial position is considered. This process of stochas- 
tically stepping through parameter space repeats until 
some convergence criteria are satisfied. Equation 2 is de- 
rived from the detailed balance condition, which is sat- 
isfied if the probability of being at state 9 X and transi- 
tioning to state G y is the same as being at 0 y and moving 
to 9 X . By fulfilling this condition when adopting new so- 
lutions in the chain, the number of iterations spent in a 
particular region of parameter space, normalized by the 
total number of steps in the chain, yields the probabil- 
ity that the model parameters have values within that 
region. 

The choice of q(0 v \9 x ) y by construction, does not al- 
ter the recovered posterior distribution function. The 
proposal distribution does, however, dramatically affect 
the acceptance rate of trial locations in parameter space 
and, therefore, the number of iterations required to sat- 
isfactorily sample the posterior. Considering Eq. [? ], 
and ignoring the proposal distribution ratio, the algo- 
rithm will accept moves to positions in parameter space 
which have higher posterior weight like any good “hill- 
climbing” search algorithm. The novelty of the MCMC 
is its willingness to adopt a worse fit solution, mitigating 
the potential of getting trapped by local features of the 
posterior. Despite this built in attribute of the algorithm, 
extremely multimodal distributions can be a detriment 
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to the efficiency of the sampler. The posteriors for typi- 
cal gravitational wave sources have been found to exhibit 
substantial sub-dominant modes which can be too diffi- 
cult for a straight-forward MCMC to overcome in a rea- 
sonable amount of time. To help alleviate the challenge 
posed by these distributions, we include parallel temper- 
ing [20] - a set of chains, running simultaneously, each 
at a higher “temperature” - as is becoming standard in 
GW applications of MCMCs, e.g. [21-23]. 

For the model selection facet of the problem, we use 
a trans-dimensiona) (or “reverse jump”) Markov Chain 
Monte Carlo (RJMCMC) [24, 25] as the tool for sam- 
pling the target posterior distribution function in model 
space. The RJMCMC stochastically moves between 
models while satisfy ing detailed balance, so the number 
of iterations the chain spends in a particular model is pro- 
portional to the marginalized likelihood for that model. 
This class of MCMC algorithm has previously been used 
to study the galactic binary detection problem on toy 
problems [26] and smaller data sets [15, 16], but has yet 
been turned loose on data containing a complete simula- 
tion of the GW signals from the entire Galaxy. 

The MCMC algorithm provides the machinery to em- 
ploy Bayes' theorem to our data analysis problem. Prop- 
erly interpreting the results from the MCMC requires 
detailed understanding of our model for the data, par- 
ticularly the definition of the likelihood function. In the 
following section we will explicitly spell out our construc- 
tion for the data, including the waveform and noise mod- 
els, the likelihood function, and the priors used in the 
analysis. 


III. THE DATA MODEL 

We model the data s as having two contributions: 

M 

s* = *«(?«)+£ /£(*)• (3) 

» 

The instrument noise, n, is assumed to be stationary and 
gaussian with colored spectral density parameterized by 
rf K . The gravitational wave component of the data is 
the superposition of M gravitational wave templates in, 
each parameterized by A which will be described in de- 
tail later. The subscript k denotes the different interfer- 
ometer channels synthesized from the LISA phase-meter 
data. We use the usual noise orthogonal AET chan- 
nels [27]. Because we expect all of the galactic binaries 
to be well below the LISA transfer frequency /* = c/2nL 
we can neglect the T channel which is, in effect, GW-free 
for / < /*. 


1. The waveform model 

Galactic binaries in the LISA band are expected to 

exhibit relatively little frequency evolution during the 


lifetime of the mission. Thus, the phase of the GWs 
emitted from the binary can be safely approximated as 
$(£) = ipo + 27t/ 0 £ +7t/o£ 1 2 + ... where higher order deriva- 
tives of / can be neglected for binaries below ~ 9 mHz 
during a five-year-long mission [28]. For this work, we 
set fo and all higher derivatives of / to 0, as is the case 
in the MLDC data simulations. 

Given these assumptions about the phase evolution of 
the binary, as well as restricting the templates to circular 
orbits (perhaps a dubious constraint, see Ref. [29]), We 
can fully describe a GB waveform with eight parameters: 

A -¥ (/o, fo, o, <j>, Ao, i, i > , ¥>o) (4) 

where the subscript 0 indicates the value taken at the 
first time-sample in the data. Parameters { 6 , <f>} describe 
the sky-location of the binary in ecliptic coordinates, and 
{ l , ^o} are angles that fix the orientation of the binary. 
The amplitude 


2 M b > 3 (tt/ 0 ) 2/3 
D l 


(5) 


couples the chirp mass M and luminosity distance Dl 
preventing the independent measurement of either quan- 
tity. If the binary orbital evolution is driven purely by 
the emission of gravitational waves (as opposed to, for in- 
stance, mass transfer between the individual stars in the 
binary), then the linear term in the frequency evolution 
depends only on the frequency and chirp mass via: 

/ = ^ 8 / 3 ^ 573 ^ 1/3 (6 ) 

Sources which satisfy this condition will henceforth be 
referred to as detached binaries. For this data set, we 
have the advantage of knowing that any binaries with 
fo > 0 are detached, allowing us to make a determination 
of D l . When analyzing data from the Galaxy itself, we 
will not have this foresight. For the rare cases where 
fo can also be measured, systems being driven only by 
the emission of gravitational radiation must satisfy the 
braking index condition: 


n — 


ft 

f 2 ' 


(7) 


LISA’s ability to measure /o, and how ignoring this pa- 
rameter impacts the data analysis, has been preliminarily 
explored in [28] and [30]. We will address this important 
detail in a follow-on study [31]. 

To compute the instrument response to a particular 
galactic binary signal we use the fast-slow decomposition 
as detailed in the appendix of [32]. 

We utilize a prior on the location of any given bi- 
nary constructed from the number density of stars in the 
galaxy. Our model for the galactic distribution is similar 
to that from which the MLDC datasets were drawn [5, 6]. 
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The density profile has two components, one from the 
disk and one from the bulge: 


Pbulge ~ 



e 



Pgalaxy ~ Apbulge (1 — ^)pdisk 


(8) 


where r\ c = x\ c + y\ c + z\ c , u\ c = x\ c + V 2 gc > and 
{x gc , 2/gc? ^gc} are the cartesian galactic coordinates of 
the source. For our purposes, the parameters of the 
galaxy distribution used are [33] 


{Ri,Rd,Z(i,A} = {690 pc, 2520 pc, 302 pc, 0.24} (9) 


Using this distribution we build a joint prior 

P(/o, /o? 0, 05 *^o) = ^Pgalaxy (10) 

with a complicated normalization constant C that we 
approximate by Monte Carlo integration over the prior 
volume. The quantities not constrained by this prior are 
the orientation parameters, which we take as having uni- 
form a priori distributions over [0, 27r] for 0 and <po> and 
[—1,1] for cost. 

To utilize the prior in Eq. 10 for templates of detached 
systems we determine the distance to the binary using 
Eqs. 5 and 6. For mass-transferring binaries we have 
no way of making such a determination without better 
understanding of the orbital dynamics. We construct a 
separate prior for such systems, marginalizing Eq. 10 over 
/o, /o- and Aq. We are left with a prior on {0,0} only, 
and adopt uniform distributions on the marginalized pa- 
rameters. 


2. The noise model 


Given nominal levels for the shot- and acceleration- 
noise ( S 8 and j S a ), the baseline noise power spectral den- 
sity for the LISA A and E channels is 


S n {f) = | sin 2 j- (2 + cos Ej S s 
+ 2( 3 + 2co 


• (11) 


To this we must add an estimate of the confusion noise 
S c which is derived from data simulations 




r 2Q—44.8 ^—2.4 

20-47.15^3.1 

i lO" 51 /" 4 ' 4 
10- 74 ' 7 /- 13 
20-59.15^-7 


10- 

-4 

<} 

r < 4.5 

X 

10“ 

•4 

4.5 

X 

10- 

4 </ 

< 

1.1 

X 10" 3 

1.1 

X 

10- 

“ 3 </ 

< 

1.7 

00 

1 

O 

X 

1.7 

X 

10 

- 3<f 

< 

2.5 

x 10" 3 

2.5 

X 

10- 

- 3 <f 

< 

4 x 

10~ 3 


( 12 ) 

To allow for modeling error in the noise levels, and the 
vagaries of the particular noise realization in the data, we 


include parameters which characterize departures from 
this theoretical noise power spectral density as described 
in [15]. A separate noise level is defined for each of several 
narrow bandwidth segments of data, each of length 3 
frequency bins. The i th segment is rescaled as S n (f) — > 
ViSn(f)- For Gaussian noise, the expectation valu e* for 
77 when measured over N^b bins is a* = 1/y/N^B- We 
accommodate for additional ignorance with respect to 
the noise level by using a normal distribution N[1,A^] 
as the prior on each 77* . 


3. The likelihood function 


With the noise and signal models now declared, the 
likelihood p{s\6) is computed over N Fourier bins, built 
from the assumption of colored Gaussian noise where 
the noise power spectral density is being fit over N seg = 
N/Ntf b narrowband segments of data, via: 


lnp(s|0) = 



^8 eg 

(r*K) + Nnb lnr? " 


n 


(13) 


The residual 


M 


r K = s K -j^hi(y) 


(14) 


appears in equation 13 inside of the noise- weighted inner 
product defined as 



s m (f)i(f) + a(/)fr(/) 

v(f)s n (f) 


(15) 


where, as described above, 77 takes the same value over 
Nnb Fourier bins. 


IV. THE ALGORITHM 


A. Overview 

In this section we will describe each phase in the 
pipeline. Following a coarse overview of the entire proce- 
dure, a detailed, step-by-step description of the algorithm 
can be found in the subsections. 

The data are first divided into small bandwidth sub- 
sets, each of which is independently analyzed. In each 
window, the analysis goes through roughly three phases: 

• Search : To locate the regions of high probability in 
parameter space. 

• Characterization : To globally sample the posterior 
distribution for the model parameters (parameter 
estimation). 
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• Evaluation : To calculate model evidence and de- 
termine which of the models are favored (model 
selection). 

In this paper, the Search phase is subdivided into a 
“burn-in” phase where we use a reduced parameter space 
to look for modes in the distribution, and a full parameter 
search that is tuned to move between the modes in order 
to identify the region in parameter space encompassing 
the global maximum likelihood. 

All of our Markov chain runs use a variety of proposal 
distributions, from uniform proposals over the full prior 
volume, to small jumps along the eigenvectors of the co- 
variance matrix estimated by the Fisher information ma- 
trix [15]. 

Each data window is studied iteratively. At each it- 
eration, we include an additional template in the model 
until we reach the maximum evidence. We apply a vari- 
ety of tricks to increase the efficiencv during the Search 
phase, many of which spoil the statistical properties of 
the chain. The “illegal” search chains are used to pro- 
duce proposal distributions for the subsequent Charac- 
terization/Evaluation phases, where we take care to sat- 
isfy detailed balance. Parameter estimation and model 
selection are performed simultaneously, as v e include the 
number of templates in the model as a parameter. For 
example, during iteration / the RJMCMC is allowed to 
move between models containing 0 < i < I templates. 
The model in which the RJMCMC spends the most it- 
erations, M \ nax , is the one with the highest evidence. If 
fcmax < I then that window is finished being analyzed, 
and the MAP parameters from model M are stored 
in the master list for the full dataset. 


B. Preparing the Data 

The bandwidth of a typical galactic binary waveform 
is sufficiently small that the data can be divided into 
small subsets, or windows, each of which spans a rel- 
atively small range in frequency, and each window can 
be analyzed independently. For practical purposes, this 
makes it simple to scale the analysis code from a test- 
ing platform to running on the full dataset. While many 
CPUs are required to process all of the data, they do not 
need to talk to one another while doing so. 

The galactic binary search will be hindered at low fre- 
quency by power from high SNR black hole binaries in 
the data, as well as bursts from cosmic strings. Both 
the black hole binaries and the cosmic strings have very 
unique time- frequency characteristics. This means the 
brightest sources can be cleanly removed from the data 
without marring the signal from the galaxy. The accu- 
rate detection of black holes has been a main theme of 
LISA science studies and is definitely a manageable task, 
while the high-accuracy detection of the cosmic strings 
has been successfully demonstrated in the previous round 
of the MLDC [34], We do not see the removal of black 
hole or cosmic string sources as a substantial technical 


challenge, and so have focused the efforts in this paper 
on training data with only instrument noise and the sig- 
nal from the galaxy. For the blind data analysis to follow 
this work, a collaborative effort will be needed to perform 
this first cleaning step'. 

In each window, care needs to be taken at the edges 
as templates will try to fit signals from adjacent data 
segments which have power “leaking” into that which is 
being analyzed. Thankfully this was addressed in the BAM 
algorithm by having a smaller acceptance region within 
each window where the initial frequency must fall if the 
source is going to be taken as a true detection. Bordering 
this acceptance region to the edges of the window are 
the “wings” of the data segment where templates are 
included in the model but the sources they recover are 
not stored as detections. Adjacent windows are tiled so 
that the end of one acceptance region is the beginning of 
another. Thus we have full coverage of the data without 
any overlap between acceptance regions, and without the 
risk of double-counting sources that happen to lie at the 
interface of two data segments. Figure 1 shows a cartoon 
depiction of a data window. 

The size of the windows is not something that can 
be fixed for all signals. The amplitude of a galactic 
binary waveform scales as / 2 / 3 , meaning the discrete 
Fourier transform of a signal typically has significant 
power across a larger bandwidth as we move to higher fre- 
quency data. Furthermore, for detached binaries (which 
make up the bulk of the galactic population) / scales 
as / n /3 go signals with high initial- frequency typically 
spread their power over more bins during the course of 
the observation. 

Because of this frequencv-dependent bandwidth, the 
size of the wings and, for efficiency’s sake, the size of the 
windows, is frequency dependent. 


C. The Search Phase 

The purpose of the search phase is to rapidly locate 
the sources in the data (i.e., the modes of the posterior 
distribution function). Here we are not concerned with 
satisfying detailed balance, or producing samples which 
represent the posterior, but instead are focused on effi- 
ciency* The two most substantial cost saving enhance- 
ments come from reducing the dimension of the search 
by maximizing the likelihood over “extrinsic parameters” 
(orientation and distance) using the F-statistic [35, 36], 
and by making the signals a bigger target in the search 
space through simulated annealing. 

Simulated annealing is another common trick per- 
formed when using Markov Chain Monte Carlo-like 
methods to rapidly locate the modes of the distribution. 
It works by initially suppressing the influence of the like- 
lihood terms in the Hasting’s ratio (Eq 2) by “heating” 
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FIG. 1: A cartoon depiction of a data window to be 
independently searched. The shaded portions are where 
detected binaries will be excluded from the master list. 
All templates within the same Signal Block are updated 
simultaneously. One parameter 77 is used to measure the 
noise PSD over each Noise Block. This concept was 
derived for the BAM algorithm and adapted from 
Ref. [10] 


the distribution being searched 

MAI) , / ?(*&) v* , 16) 

p(s\o x ) \p(a\^~)J 

with the exponent playing the role of an inverse “tem- 
perature” 0 = 1 /T with 0 < 0 < 1. Early iterations of 
the chain are run at high temperature (small 0) forcing 
the likelihood ratio term to ~ 1 . The influence of the 
likelihood is gradually increased as the search space is 
gradually “cooled” at each iteration i via 

0 = 

until 0 goes to 1 and the chains are sampling the target 
distribution. 

Simulated annealing requires some tuning in order for 
it to actually improve the search’s efficiency. The ad- 
justable parameters to the annealing scheme, the maxi- 
mum temperature T max and the cooling time r, need to 
be custom suited for each problem. Conceptually, what 
we want is for T max to be high enough that chains are 
able to freely explore the full prior range, while not being 
so absurdly hot that we spend many iterations with no 
hope of locking on to any of the signal. A reasonable rule 
of thumb is that the effective SNR of the signals as seen 
by the tempered chain is reduced from the true SNR by 
a factor of ~ 1 /y/T. To this end, we want T max to be 
~ SNR 2 , and can reasonably determine it based on the 


! 


Tmax ) T “«“ 0 - 1 < T (17) 


i > r 


excess Fourier power in the data: 

T m ^=(s\s)-4N. (18) 

Setting the cooling rate r is an exercise in trial and error, 
and depends on the bandwidth of the data window, with 
higher frequencies warranting longer cooling times. 

In addition to simulated annealing, and in all other 
phases of the analysis, we use the now commonplace 
method of “parallel tempering” where multiple chains 
are run simultaneously at different temperatures, with 
exchanges of parameters between chains subject to the 
detailed balance condition. 

To further increase the efficiency of the search, we wish 
to reduce the volume of the search space. For this pur- 
pose, the F-statistic is a tool which has proven extremely 
useful in LIGO/ Virgo searches [37], and, in a LISA con- 
text, proof-of-principle data analysis black holes [38], and 
galactic binaries [10, 12, 36], Because of it’s frequent use 
in the GW data analysis literature, we leave the details 
to the aforementioned references. 

While the speed-up in the search time when using the 
F-statistic is substantial, once the (approximately) best- 
fit frequency and sky location for each source in the 
model have been located, it becomes a liability. For a 
model containing Ns sources, a single F-statistic eval- 
uation involves 4N$ calls to the waveform generator as 
well as a 4 Ns x 4 Ns matrix inversion, ultimately costing 
more than 4 Ns likelihood evaluations. Therefore, once 
the F-statistic has done its job and found the modes, the 
chains are much more efficient reverting to the Gaussian 
likelihood as described in §11. The points of the chain 
from the F-statistic search are discarded as the burn-in 
samples. 

At this point we are interested in producing an en- 
semble of samples that approximate the target posterior 
distribution function. However, we are not yet ready to 
abandon some of our cost saving measures in favor of de- 
tailed balance. The posteriors for these signals are mul- 
timodal and in a few percent of trial runs, the burn-in 
phase ends on a secondary maximum of the distribution. 
The nature of these near-degeneracies was explored with 
detail in [10]. To summarize, the orbital motion of the 
LISA constellation, as well as the finite number of data 
points, imparts a harmonic structure to the waveforms 
on either side of the initial frequency of the binary. The 
templates can fix themselves to a sub-dominant harmonic 
while still achieving overlaps with the injected signals of 
> 70%. 

Such features in the posterior expose the weaknesses 
of an “out of the box” MCMC. While a generic MCMC 
chain is guaranteed to eventually converge, in most cases 
we are not willing to wait long enough. While there are 
certainly more elegant ways of overcoming these types of 
challenges, we offer a brute- force approach inspired by de- 
layed rejection [39, 40]. We propose some position in pa- 
rameter space By and, without asking the Hastings ratio 
for any input, temporarily adopt this position and evolve 
the chains from there, searching for a nearby point with 



7 


higher likelihood. After some fixed number of updates 
the chain arrives at 0 f y . We then look back and calculate 
the transition probability ag, ^ using the Hasting’s ra- 
tio in Equation 2. This transition probability does not 
satisfy the detailed balance condition, and so the sam- 
ples from the chain will be biased in some way. For a lot 
of effort, delayed rejection performs this type of explo- 
ration while preserving detailed balance as described in 
detail within a gravitational wave data analysis context 
by Trias et al in Ref [13]. For our purposes, we are not 
yet concerned with the statistical properties of our chain 
and accept the fact that our intermediate distributions 
will be biased. 

Our lazy implementation of delayed rejection is per- 
fectly suited to prevent us from sticking on a sub- 
dominant harmonic of the waveform. The secondary 
modes appear at integer multiples of the LISA modu- 
lation frequency f m = 1/year. We propose a shift in 
frequency by some rc/ m , where n is a random integer 
drawn from U[-6,6], adopt that solution, and then allow 
the chains a few iterations to refine the remaining param- 
eters (in particular, the sky location) before comparing 
back to the current solution. This addition to the search 
procedure dramatically reduces the instances of recov- 
ered sources having the central frequency on a harmonic 
induced by the orbital motion. 

Figure 2 is an instructive depiction of the challenging 
multimodal structure of the posterior. We show a scatter 
plot of chain samples in the p(s|0) — / plane. The plot is 
centered at the true frequency of the injected source. The 
sub-dominant modes of the distribution are clearly visi- 
ble as peaks in the likelihood surface occurring at even 
integer multiples of f m T = 2 Fourier bins. In this exam- 
ple, the search chain approached from higher frequency 
which is why we only see sub-dominant modes to the right 
of the peak. In general, these distributions are (roughly) 
symmetric about the best-fit value of /o. A single Markov 
chain without the benefit of delayed rejection or paral- 
lel tempering would be severely challenged by these local 
features of the posterior distribution function. Missing 
the global maximum will not only produce poor parame- 
ter estimation for that source, but will also leave behind 
a coherent residual to which subsequent iterations will 
attempt to fit. Again, we stress that the (RJ)MCMC al- 
gorithm can overcome these challenges without any help, 
but the convergence time is often impractically long. If 
chains are not mixing well, the potential for sticking to a 
local feature of the posterior is increased, and the chances 
of subsequent iterations rectifying that error during the 
model selection phase becomes vanishingly small. 

D. The Characterization &: Evaluation Phase 

Upon the completion of the search phase, the chains 
have produced samples of some biased distribution func- 
tion, the modes of which closely correspond to that of 
the target posterior. To accurately refine the characteri- 
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FIG. 2: Delayed Rejection at work: This scatter plot 
shows samples from a search chain in the p(s|0) — / 
plane. The plot is centered at the true frequency of the 
injected source. Delayed rejection enables the chain to 
efficiently leave a secondary maximum in favor of the 
true source parameters. 


zation of the model and to evaluate wether or not it is the 
one favored by the data, we run a RJMCMC following all 
of the rules to ensure the chain samples are representa- 
tive of the target posterior. RJMCMCs are notoriously 
difficult to work with in high-dimension problems. The 
proposal distributions have to be informative enough that 
drawing a random point out of (for this example) an 8 
dimensional parameter space produces a reasonable fit to 
the data, but are not so constraining that the improved 
fit to the data is overly penalized by how strongly vou 
forced the trial solution to that point. For these tech- 
niques to work efficiently, the chains can not tolerate a 
proposal that is anything but a good approximation to 
the posterior. 

Fortunately for us, we spent the search phase produc- 
ing such an approximation. While these biased chains 
could not be used for 'model selection or parameter es- 
timation, there are no rules against employing them to 
build suitable proposal distributions. This concept was 
originally suggested by Green [25] and we have described 
in detail the procedure applied to this work in Ref. [15]. 
In short, we bin the chains into an 8D grid using fisher 
matrix estimates of the parameter variances to set the 
cell size in the grid. The number of samples from the 
chain that land in a given cell is proportional to the prob- 
ability density in that cell. The proposal distribution 
randomly chooses a cell weighted by its probability, and 
then uniformly draws within the cell to come up with the 
parameters. Recently, Farr and Mandel [41] introduced 
an improvement on how to bin the chain samples by us- 
ing a kD-tree data structure instead of our Fisher-scaled 
grid. It is clear that this will further improve the effi- 
ciency of these proposals by eliminating the dependence 
on the grid size, something which we had to carefully 
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tune. We will transition to the kD-tree decomposition in 
future work. 

As mentioned previously, this analysis is performed it- 
eratively, including an additional signal in the model at 
each iteration. During the characterization phase of the 
procedure on iteration I the RJMCMC is exploring mod- 
els containing anywhere between 0 and / signals. The 
model with the strongest support is the one in which the 
RJMCMC spends the most iterations. When two models 
are similarly supported by the data a more subtle selec- 
tion has to be made. Between any two models we can 
calculate the Bayes factor 

_ p{s\Mj) _ # of iterations in Mj . 
p(s\Mi) # of iterations in Mi 

which is easily interpreted as the odds (ignoring prior 
preferences for one model over another) that Mj is pre- 
ferred over Mi . Thus, Bij ~ 1 means that both models 
are similarly supported by the data. While that is a 
nice interpretation, when assembling a catalogue of de- 
tectable galactic binaries and producing a residual fit for 
subsequent searches, decisions need to be made about 
which model to pick in these marginally distinguishable 
cases. This, sadly, somewhat reduces the Bayes factor 
to a statistic used for model selection. Nevertheless, we 
have to draw a line in the sand, and have chosen for this 
study a Bayes factor “threshold” of 12:1 needed to pre- 
fer a higher dimensional model, above which the support 
for M j is canonically considered “strong” [42] . A. more 
satisfying thing to do would be to repeat the analysis for 
several different Bij cutoffs, producing appendices to the 
final source catalogue of more speculative detections. 

If the highest dimension model is supported by the 
data, we store the map parameters and begin another 
iteration. If not, the MAP parameters from the winning 
model are stored and no more iterations on that win- 
dow are performed. Only sources with initial frequency 
inside the acceptance region are added to the “master” 
list of detections. Figure 3 shows ~ 5 windows’ worth of 
training data, with the best fit waveforms over-plotted. 
A relatively high frequency window was chosen for this 
demonstration so that the different signals in the fit could 
be distinguished. 

E. Source Subtraction 

The biggest cost to performing these MCMC runs is 
the waveform calculation. While our waveform model is 
extremely efficient, there is not much that can be done 
about the colossal number of templates that need to be 
computed to resolve 10000+ sources. To help mitigate 
this expense, we want to hold fixed the waveform param- 
eters for sources which are not significantly overlapping 
other sources in the window. 

After each iteration, if the new source is within some 
pre-defined number of frequency bins (depending on the 
characteristic bandwidth of sources in that window), 



FIG. 3: Power spectral density of a small bandwidth 
segment of the training data. This figure spans ~ 5 
search windows. The green [dashed] line is the original 
data, while the red [solid] line shows the best fit signal 
model. 


than on the following iteration this template, along with 
any others that satisfy the closeness condition, and the 
new source included in the model, are allowed to vary. 
Otherwise, the signals located in previous iterations are 
held fixed. This keeps the number of “active” sources per 
window, per iteration, much lower than the total number 
of detectable binaries in that segment of data. We are 
essentially searching on the residual of the data from pre- 
vious iterations, but with the caveat that if new sources 
get too close to existing detections than all nearby sig- 
nals need to be simultaneously re-characterized. Figure 4 
shows the same data segment as in Figure 3, but with 
the best fit model regressed from the original data. The 
residual is consistent with stationary Gaussian noise at 
the level of the instrument noise. 


V. THE RECOVERED SOURCE CATALOGUE 

We performed a comprehensive test run of the algo- 
rithm on the MLDC challenge 4 training data set, con- 
taining only instrument noise and the signals from the 
simulated galaxy. As this run was to serve as a test of 
the algorithm, the search was not carried to completion 
in all frequency windows. In this section we quantify 
the performance of the algorithm by comparing the re- 
covered catalogue to the source list supplied with the 
training data. 

1. Evaluating the recovered catalogue 

The total number of detected galactic binaries that we 
located in the data before halting the search was ~ 9000. 
The merit of our recovered catalogue was judged using 
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FIG 4: Same as Figure 3, but with the red [solid] line 
replaced by the residual power spectral density after the 
best fit model has been regressed from the data. 


the MLDC challenge 3 evaluation software available as 
part cf the lisatools software package [43]. 

For each recovered source the evaluation software 
searches through the list of (SNR > 1) sources in the sim- 
ulates galaxy, and determines which injected signal /ii n j 
gives :he lowest noise- weighted residual (/i* m j -/i rec |^mj — 
h rec ). Once h rec is paired with the corresponding h m j the 
correlation between the two waveforms 


Corn 


inj,rec — 


(hjnjlferec) 


y/ (hmj|hinj)(h re c|h rec) 


( 20 ) 


is computed and stored in a “report card' 5 for the recov- 
ered catalogue. A histogram of the correlations for our 
recovered population of sources is depicted in Figure 5. 



FIG. 5: Correlations between recovered sources and 
their corresponding signal in the training data. Of the 
~ 9000 detections over 90% had a correlation above 0.9 
with an injected signal. 


Along with the correlation files, the MLDC evaluation 
software saves the parameter error between each recov- 
ered signals and its corresponding source in the data. 
The distribution of errors for the entire catalogue will 
reveal any systematic biases in the parameter recovery. 
Figure 6 shows the distribution of recovered parameter 
biases. We are pleased to report that all parameters show 
a strong peak at zero bias. 

The galaxy search serves the dual purpose of recovering 
a trove of information about the galaxy, and removing a 
substantial amount of foreground signal-power, facilitat- 
ing the search for signals at cosmological distances. The 
residual power spectral density of the training data after 
having removed the recovered signals is shown in Fig- 
ure 7. The dashed [green] trace shows the galaxy-only 
training data (without BHBs, EMRIs, etc.), while the 
solid [red] line is the residual after our recovered cata- 
logue has been regressed from the data. Notice we only 
performed the search out to 0.01 Hz. At higher frequen- 
cies the signals are typically bright and isolated. This 
makes them easy to find, but computationally expensive 
to characterize, as the bandwidth of the signal increases 
with frequency and amplitude. Individual high frequency 
windows have been tested and will be included in our 
blind search, but we omit them from the results here. It 
is clear from the residual that there are more detectable 
sources than the 9000 on which we are reporting. How- 
ever, as a test of the algorithm we take this as a satisfying 
result. 


2. Galactic binary astronomy from the recovered catalogue 


While the focus of this work has been the algorithm 
and its performance on simulated data, the real fun be- 
gins when we brandish the source catalogue as an un- 
precedented astronomical tool. Our follow-on work to 
this paper will address some specifics, but in the mean 
time there is one piece of “low hanging fruit” that is too 
good to pass up. As mentioned in §IH, the inclusion of 
fo in the waveform model presents us with an oppor- 
tunity to disentangle the luminosity distance from the 
overall GW amplitude. To demonstrate this capability, 
we select from the catalogue any binary with SNR>15 
and foT 2 > 1 and compute Dl using Eqs. 5 and 6. The 
resulting galactic map is shown in Figure 8, exhibiting 
the unique capability of low frequency gravitational wave 
detectors to reconstruct a three-dimensional map of the 
entire Galaxy. 

Admittedly, we artificially benefit from knowing that 
any binary with fo > 0 is dynamically evolving under ra- 
diation reaction forces only. How this assumption biases 
the spatial distribution of the detected population is a 
top priority of our subsequent studies. 
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FIG. 6: [Top left] The distribution of SNRs for the recovered signals. [All others] Distributions of the difference 
between the recovered signal parameters and the associated source in the data. We see no significant systematic 
biases between our best fit parameters and true values for the recovered sources. The orientation parameters of the 
binary are typically not well measured, hence the larger tails on the error distributions. 


VI. DISCUSSION 

Using [10] as a foundation, we have developed our own 
galactic binary detection code and tested it on the round 
4.0 training data supplied by the Mock LISA Data Chal- 
lenge Task Force. New features to this analysis algorithm 
are the inclusion of parallel tempering and delayed rejec- 
tion to increase the efficiency of the search, and using a 
trans-dimensional MCMC to simultaneously characterize 
the detected binaries and determine the most likely num- 
ber of binaries in each small-bandwidth segment of data 
being analyzed. We also include the /o parameter in the 
waveform modeling which was not part of the original 
work by Crowder and Cornish, but was included in the 
MLDC round 3 data and subsequent entries. 

The codes were written for the purpose of participat- 
ing in the blind challenge, so the analysis of the train- 
ing data was halted before completion. Before moving 
on to the challenge data, we recovered ~ 9000 sources, 
90% of which matched one of the injected sources with 


correlation greater than 0.90. It is worth noting that 
a smaller-scaie space borne gravitational wave detector 
is more likely to fly in the near future than the full 5 
Gm LISA mission concept. If anything, that makes our 
current search software overqualified to handle whatever 
data we are presented. 

Of course, detecting the sources is the first step to- 
wards using the data to its full potential, as a unique 
probe of the Galaxy. Including the first time deriva- 
tive of the frequency in the waveform model allows us 
to compute the distance to binaries whose dynamics are 
not affected by mass transfer or tidal effects. From the 
recovered source catalogue, we measured the sky-location 
and / well enough to constrain Dl of ~ 1000 binaries, 
sampled from the entire volume of the Galaxy. This 
result hints at the potential for low-frequency gravita- 
tional wave astronomy to offer an unprecedented view of 
how compact stellar remnants are distributed. Implicit 
in the calculation of the distance is the supposition that 
all mass-transferring systems have / < 0. How our abil- 
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FIG. 7: Power spectral density of the training data 
(dashed/green) and residual (solid/red) after removal of 
the recovered sources. The remaining spikes in the 
residual are detectable signals left behind after we 
prematurely halted the search in favor of analyzing the 
blind data. 


ity to map the distribution of the Galaxy is impacted by 
relaxing this assumption will be the focus of follow-on 
work. 
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